Comparative study of several distortion measures for speech recognition
Abstract
In this study we compared several spectral distortion measures, namely the Itakura-Saito (IS), the log likelihood ratio (LLR), the likelihood ratio (LR), and the cepstral (CEP) measures, together with two perceptually based measures, the weighted likelihood ratio (WLR) and the weighted slope metric (WSM). The comparison was based on their effect on the performance of a standard isolated-word speech recognizer based on dynamic time warping (DTW). Two modifications of the basic form of each measure were also investigated: a Bark-scale frequency warping, and the incorporation of suprasegmental energy information. All distortion measures and their modifications were tested on a four-talker database of telephone recordings of an alpha-digit vocabulary. The results can be summarized as follows: (1) All LPC-based distortion measures performed reasonably well; the LLR and WSM distortion measures gave the highest recognition accuracy, while the IS distortion measure gave the lowest; (2) Adding suprasegmental energy information improved recognition performance, whereas using gain or absolute loudness degraded it; (3) Bark-scale frequency warping did not perform as well as its unwarped counterpart; (4) The WLR distortion measure did not perform as well as its unweighted counterpart.

I. Introduction

Since it was first introduced, the Itakura-Saito distortion measure [1] has played a key role in speech coding, analysis, synthesis, and recognition. Several studies have investigated the relationships among different LPC-based distortion measures and their properties from a theoretical point of view [2,3]. The goal of this research is to compare several basic distortion measures (including two recently proposed, perceptually based measures [4,5]) and to study their influence on the performance of an isolated-word, DTW-based speech recognition system.
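The following minimal Python sketch shows how a frame-level distortion measure plugs into a template-based DTW recognizer. It is not the recognizer used in the study. The names `dtw_distance`, `recognize`, and `cepstral_distance` are hypothetical. So are the three-way step pattern and the normalization by the total number of frames. The squared Euclidean cepstral distance is only one common form of the CEP measure. Any of the other measures could be passed in as `dist`. Because the LPC-based measures are asymmetric, the sketch always calls `dist(test_frame, reference_frame)` in that order.

```python
import math
from typing import Callable, Dict, List, Sequence

Frame = Sequence[float]
Distortion = Callable[[Frame, Frame], float]


def cepstral_distance(c_test: Frame, c_ref: Frame) -> float:
    """Squared Euclidean distance between two truncated cepstral vectors (one common CEP form)."""
    return sum((a - b) ** 2 for a, b in zip(c_test, c_ref))


def dtw_distance(test: List[Frame], ref: List[Frame], dist: Distortion) -> float:
    """Accumulated frame distortion along the best warping path.

    Hypothetical choices: steps (i-1, j), (i, j-1), (i-1, j-1), and the
    final cost divided by len(test) + len(ref).
    """
    n, m = len(test), len(ref)
    # D[i][j] = minimum accumulated cost of aligning test[:i] with ref[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(test[i - 1], ref[j - 1])  # test frame first, reference second
            D[i][j] = local + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] / (n + m)


def recognize(test: List[Frame], templates: Dict[str, List[Frame]], dist: Distortion) -> str:
    """Return the vocabulary word whose reference template is closest under DTW."""
    return min(templates, key=lambda word: dtw_distance(test, templates[word], dist))


if __name__ == "__main__":
    # Toy two-word vocabulary with 2-dimensional "cepstral" frames.
    templates = {"a": [[0.0, 0.0], [1.0, 1.0]], "b": [[5.0, 5.0], [6.0, 6.0]]}
    test = [[0.1, 0.0], [0.9, 1.1], [1.0, 1.0]]
    print(recognize(test, templates, cepstral_distance))  # prints a
```

Holding the DTW machinery fixed and changing only `dist` mirrors the experimental design described above: the recognizer stays the same, and only the spectral distortion measure varies.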
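We also tested two modifications of the basic distortion measures: Bark-scale frequency warping of the LPC-derived distortion measures, and the incorporation of suprasegmental energy information.

II. Spectral Distortion Measures

2.1 Itakura-Saito Distortion Measure

The maximum likelihood distortion measure, also known as the Itakura-Saito distortion measure, was first used for short-time spectral estimation of speech signals. The measure, denoted $d_{IS}$, is

$$d_{IS}(S_n, f) = \int_{-\pi}^{\pi} \left[ \frac{S_n(\lambda)}{f(\lambda)} + \ln\frac{f(\lambda)}{S_n(\lambda)} - 1 \right] \frac{d\lambda}{2\pi}$$

where $S_n(\lambda)$ is the short-time spectral density (or periodogram) of an input speech signal, and

$$f(\lambda) = \frac{\sigma^2}{\left|1 + a_1 e^{-j\lambda} + \cdots + a_p e^{-jp\lambda}\right|^2} = \frac{\sigma^2}{\left|A(e^{j\lambda})\right|^2}$$

is the spectral density function of a corresponding $p$th-order all-pole model. Define $d(\lambda)$ as the log spectral distance between $S_n(\lambda)$ and $f(\lambda)$ at frequency $\lambda$, i.e. $d(\lambda) = \ln S_n(\lambda) - \ln f(\lambda)$. The measure can then be written as $d_{IS}(S_n, f) = \int_{-\pi}^{\pi} \left[ e^{d(\lambda)} - d(\lambda) - 1 \right] \frac{d\lambda}{2\pi}$.

2.2 The Log Likelihood Ratio (LLR) and the Likelihood Ratio (LR) Distortion Measures

$$d_{LLR}(f, f') = \ln\left[1 + d_{IS}\!\left(\frac{f}{\sigma^2}, \frac{f'}{\sigma'^2}\right)\right] = \ln \int_{-\pi}^{\pi} \frac{\left|A'(e^{j\lambda})\right|^2}{\left|A(e^{j\lambda})\right|^2}\,\frac{d\lambda}{2\pi}$$

where $f'(\lambda) = \sigma'^2 / |A'(e^{j\lambda})|^2$ is a second $p$th-order all-pole model spectrum.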
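The IS measure can be checked numerically. The following standard-library Python sketch evaluates $d_{IS}$ through the log spectral difference $d(\lambda)$. It approximates the $\frac{1}{2\pi}\int_{-\pi}^{\pi}$ integral by averaging over a uniform frequency grid. For the LR and LLR measures, the sketch assumes the common gain-normalized forms $d_{LR} = \int_{-\pi}^{\pi} |A'(e^{j\lambda})|^2 / |A(e^{j\lambda})|^2 \,\frac{d\lambda}{2\pi} - 1$ and $d_{LLR} = \ln(1 + d_{LR})$, where $A'$ is a second (reference) inverse filter. $d_{LR}$ equals $d_{IS}$ between the gain-normalized spectra $1/|A|^2$ and $1/|A'|^2$ when both inverse filters are minimum phase, as LPC models from the autocorrelation method are. The function names, the grid size `n`, and the argument order are hypothetical.

```python
import cmath
import math
from typing import Callable, List, Sequence

Spectrum = Callable[[float], float]


def inverse_filter_gain(a: Sequence[float], lam: float) -> float:
    """|A(e^{j lam})|^2 with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p."""
    A = 1 + sum(ak * cmath.exp(-1j * (k + 1) * lam) for k, ak in enumerate(a))
    return abs(A) ** 2


def allpole_spectrum(a: Sequence[float], sigma2: float) -> Spectrum:
    """All-pole model spectrum f(lam) = sigma^2 / |A(e^{j lam})|^2."""
    return lambda lam: sigma2 / inverse_filter_gain(a, lam)


def _grid(n: int) -> List[float]:
    """n uniformly spaced frequencies covering one period [-pi, pi)."""
    return [-math.pi + 2 * math.pi * i / n for i in range(n)]


def d_is(S: Spectrum, f: Spectrum, n: int = 1024) -> float:
    """Itakura-Saito distortion as the grid average of e^d - d - 1, d = ln S - ln f."""
    total = 0.0
    for lam in _grid(n):
        d = math.log(S(lam)) - math.log(f(lam))
        total += math.exp(d) - d - 1
    return total / n  # grid average approximates (1/2pi) * integral over one period


def d_lr(a: Sequence[float], a_prime: Sequence[float], n: int = 1024) -> float:
    """Assumed LR form: average of |A'|^2 / |A|^2 over frequency, minus 1."""
    ratio = sum(inverse_filter_gain(a_prime, lam) / inverse_filter_gain(a, lam) for lam in _grid(n))
    return ratio / n - 1


def d_llr(a: Sequence[float], a_prime: Sequence[float], n: int = 1024) -> float:
    """Assumed LLR form: ln(1 + d_LR)."""
    return math.log(1 + d_lr(a, a_prime, n))
```

For example, take $A(z) = 1 - 0.5z^{-1}$ (`a = [-0.5]`) and $A'(z) = 1 + 0.3z^{-1}$ (`a_prime = [0.3]`). The closed-form value is $d_{LR} = 1.09 \cdot \tfrac{4}{3} + 0.6 \cdot \tfrac{2}{3} - 1 \approx 0.853$, and `d_lr(a, a_prime)` reproduces it. `d_is` depends on the model gains $\sigma^2$, while `d_lr` and `d_llr` depend only on the predictor coefficients.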
Journal: Speech Communication
Volume: 4
Publication date: 1985